

Search for: All records

Creators/Authors contains: "Lee, Jason"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Free, publicly-accessible full text available December 9, 2026
  2. A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a “large-sample” regime, imposing an enormous burn-in cost for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem for finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of MVP (Monotonic Value Propagation), an optimistic model-based algorithm proposed by Zhang et al. [82], achieves a regret on the order of (modulo log factors)\begin{equation*} \min \big\lbrace \sqrt{SAH^3 K},\, HK \big\rbrace, \end{equation*} where S is the number of states, A is the number of actions, H is the horizon length, and K is the total number of episodes. This regret matches the minimax lower bound for the entire range of sample sizes K ≥ 1, essentially eliminating any burn-in requirement. It also translates to a PAC sample complexity (i.e., the number of episodes needed to yield ε-accuracy) of \(\frac{SAH^3}{\varepsilon^2}\) up to log factors, which is minimax-optimal for the full ε-range (see the worked conversion sketched after this list). Further, we extend our theory to unveil the influence of problem-dependent quantities such as the optimal value/cost and certain variances. The key technical innovation lies in a novel analysis paradigm (based on a new concept called “profiles”) to decouple the complicated statistical dependency across sample trajectories, a long-standing challenge facing the analysis of online RL in the sample-starved regime.
    Free, publicly-accessible full text available May 2, 2026
  3. Free, publicly-accessible full text available April 30, 2026
  4. Free, publicly-accessible full text available April 24, 2026
  5. Free, publicly-accessible full text available December 1, 2025
  6. Free, publicly-accessible full text available December 15, 2025
  7. Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks and provide a rich template to study statistical and computational trade-offs in the high-dimensional regime (an illustrative instance is sketched after this list). While the information-theoretic sample complexity to recover the hidden direction is linear in the dimension d, we show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require Ω(d^{k⋆/2}) samples, where k⋆ is a “generative” exponent associated with the model that we explicitly characterize. Moreover, we show that this sample complexity is also sufficient, by establishing matching upper bounds using a partial-trace algorithm. Therefore, our results provide evidence of a sharp computational-to-statistical gap (under both the SQ and LDP classes) whenever k⋆ > 2. To complete the study, we construct smooth and Lipschitz deterministic target functions with arbitrarily large generative exponents k⋆.
  8. We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form h = g ∘ p, where p is a degree-k polynomial and g is a degree-q polynomial (a toy instance of this class is sketched after this list). This function class generalizes the single-index model, which corresponds to k = 1, and is a natural class of functions possessing an underlying hierarchical structure. Our main result shows that for a large subclass of degree-k polynomials p, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target h up to vanishing test error in Õ(d^k) samples and polynomial time. This is a strict improvement over kernel methods, which require Ω(d^{kq}) samples, as well as existing guarantees for two-layer networks, which require the target function to be low-rank. Our result also generalizes prior works on three-layer neural networks, which were restricted to the case of p being a quadratic. When p is indeed a quadratic, we achieve the information-theoretically optimal sample complexity Õ(d^2), which is an improvement over prior work (Nichani et al., 2023) requiring a sample size of Õ(d^4). Our proof proceeds by showing that during the initial stage of training the network performs feature learning to recover the feature p with Õ(d^k) samples. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, learn a broad class of hierarchical functions.
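A worked conversion for item 2 (a sketch based only on the bounds quoted in the abstract, not text from the paper): by the standard online-to-batch argument, the average per-episode suboptimality after K episodes is at most Regret(K)/K, so the stated regret bound gives
\begin{equation*} \frac{1}{K}\,\mathrm{Regret}(K) \;\lesssim\; \sqrt{\frac{SAH^3}{K}} \;\le\; \varepsilon \quad\Longleftrightarrow\quad K \;\gtrsim\; \frac{SAH^3}{\varepsilon^2}, \end{equation*}
which recovers the \(\frac{SAH^3}{\varepsilon^2}\) PAC sample complexity stated in the abstract, up to log factors.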
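An illustrative instance for item 7 (a minimal sketch, not the paper's algorithm; the link function, dimensions, and estimator below are placeholder choices): it generates data from a single-index model, where labels depend on the input only through one hidden direction, and applies the simplest first-moment estimator, which only succeeds in the easiest case where the link has a non-zero first Hermite coefficient.

    # Minimal single-index-model sketch: labels depend on x only through <w_star, x>.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 50, 10_000                       # ambient dimension, sample size
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)        # hidden unit direction to be recovered

    def link(z):
        # Placeholder non-linear link; the paper allows generic, possibly random links.
        return np.tanh(z) + 0.5 * z**2

    X = rng.standard_normal((n, d))         # isotropic Gaussian inputs
    y = link(X @ w_star) + 0.1 * rng.standard_normal(n)   # noisy labels

    # Toy estimator: the first-moment statistic (1/n) * sum_i y_i * x_i aligns with w_star
    # whenever the link has a non-zero first Hermite coefficient (the easiest, k* = 1, case);
    # for larger generative exponents, the abstract's Omega(d^{k*/2}) sample barrier applies
    # to SQ/LDP methods.
    w_hat = X.T @ y / n
    w_hat /= np.linalg.norm(w_hat)
    print("alignment |<w_hat, w_star>|:", abs(w_hat @ w_star))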
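A toy instance for item 8 (a minimal sketch, not from the paper; the particular p, g, and dimensions are placeholder choices): it writes down one member of the hierarchical target class h = g(p(x)) over standard Gaussian inputs.

    # Hierarchical target h = g o p: an outer scalar polynomial g composed with an
    # inner multivariate polynomial feature p of the input.
    import numpy as np

    rng = np.random.default_rng(1)
    d = 20

    def p(x):
        # Placeholder inner feature: a degree-3 polynomial of x.
        return x[..., 0] * x[..., 1] * x[..., 2] + x[..., 3]

    def g(z):
        # Placeholder outer link: a degree-2 polynomial of the scalar feature.
        return 2.0 * z**2 - z + 1.0

    def target(x):
        return g(p(x))                      # hierarchical target h = g(p(x))

    X = rng.standard_normal((5, d))         # standard Gaussian inputs, as in the abstract
    print(target(X))
    # The single-index model is the special case where p is linear (degree k = 1).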